Generate TV Scripts

All Required Files and Tests

Criteria | Meets Specification

All project files are included in the submission

The project submission contains the project notebook, called “dlnd_tv_script_generation.ipynb”.

All unit tests in the project have passed

All the unit tests in the project have passed.

Pre-processing Data

Criteria | Meets Specification

The function create_lookup_tables is implemented

The function create_lookup_tables creates two dictionaries:

  • A dictionary to go from a word to an id, which we'll call vocab_to_int
  • A dictionary to go from an id to a word, which we'll call int_to_vocab

The function create_lookup_tables returns these dictionaries as a tuple (vocab_to_int, int_to_vocab).
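
As a rough illustration, a function meeting this criterion might look like the sketch below. Ordering words by frequency is a common convention, not a requirement; any consistent id assignment works.

```python
from collections import Counter

def create_lookup_tables(text):
    """
    Create lookup tables for the vocabulary.
    :param text: list of words from the TV script
    :return: tuple (vocab_to_int, int_to_vocab)
    """
    # Order words by frequency so the most common words get the lowest ids
    word_counts = Counter(text)
    sorted_vocab = sorted(word_counts, key=word_counts.get, reverse=True)

    int_to_vocab = {ii: word for ii, word in enumerate(sorted_vocab)}
    vocab_to_int = {word: ii for ii, word in int_to_vocab.items()}

    return vocab_to_int, int_to_vocab
```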

A special token dictionary is created

The function token_lookup returns a dict that correctly tokenizes the provided symbols.
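
A minimal sketch follows. The exact token strings are a free choice; the ones below are only examples, picked so they are unlikely to collide with real words in the script.

```python
def token_lookup():
    """
    Generate a dict to turn punctuation into tokens.
    :return: dict mapping each punctuation symbol to a token string
    """
    # The token strings themselves are arbitrary placeholders
    return {
        '.': '||period||',
        ',': '||comma||',
        '"': '||quotation_mark||',
        ';': '||semicolon||',
        '!': '||exclamation_mark||',
        '?': '||question_mark||',
        '(': '||left_parentheses||',
        ')': '||right_parentheses||',
        '-': '||dash||',
        '\n': '||return||',
    }
```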

Batching Data

Criteria | Meets Specification

Data is broken into sequences

The function batch_data breaks up word ids into sequences of the appropriate length, such that only complete sequences are constructed.

Data is created using TensorDataset

In the function batch_data, data is converted into Tensors and formatted with TensorDataset.

Data is batched correctly.

Finally, batch_data returns a DataLoader for the batched training data.
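
Taken together, the three batching criteria could be satisfied with a sketch like this. The windowing scheme shown, pairing each window of sequence_length word ids with the single word that follows it, is an assumption; other target schemes can also work.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

def batch_data(words, sequence_length, batch_size):
    """
    Batch the word-id data using TensorDataset and DataLoader.
    :param words: list of word ids
    :param sequence_length: length of each input sequence
    :param batch_size: number of sequences per batch
    :return: DataLoader over (feature, target) pairs
    """
    # Only complete sequences are kept: each feature window needs
    # sequence_length words plus one following word as its target.
    n_sequences = len(words) - sequence_length
    features = np.array([words[i:i + sequence_length] for i in range(n_sequences)])
    targets = np.array([words[i + sequence_length] for i in range(n_sequences)])

    # Convert to Tensors, wrap with TensorDataset, then batch with DataLoader
    data = TensorDataset(torch.from_numpy(features), torch.from_numpy(targets))
    return DataLoader(data, shuffle=True, batch_size=batch_size)
```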

Build the RNN

Criteria | Meets Specification

An RNN class has been defined

The RNN class has complete __init__, forward, and init_hidden functions.

The RNN includes at least one LSTM (or GRU) and fully-connected layer.

The RNN must include an LSTM or GRU and at least one fully-connected layer. The LSTM/GRU should be initialized correctly, including its hidden state in init_hidden, where relevant.
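
A sketch of such a class follows, using an LSTM; a GRU version would differ only in the recurrent layer and in init_hidden returning a single tensor instead of a (hidden, cell) tuple. Layer sizes and the dropout value are placeholders.

```python
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, vocab_size, output_size, embedding_dim,
                 hidden_dim, n_layers, dropout=0.5):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers

        # Embedding turns word ids into dense vectors
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # Recurrent layer; nn.GRU would work the same way here
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers,
                            dropout=dropout, batch_first=True)
        # Fully-connected layer maps hidden states to vocabulary scores
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden):
        embeds = self.embedding(x)
        lstm_out, hidden = self.lstm(embeds, hidden)
        out = self.fc(lstm_out)
        # Keep only the scores for the last time step of each sequence
        out = out[:, -1]
        return out, hidden

    def init_hidden(self, batch_size):
        # Initialize hidden and cell states to zeros, on the same
        # device and dtype as the model's parameters
        weight = next(self.parameters()).data
        h = weight.new_zeros(self.n_layers, batch_size, self.hidden_dim)
        c = weight.new_zeros(self.n_layers, batch_size, self.hidden_dim)
        return (h, c)
```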

RNN Training

Criteria | Meets Specification

Reasonable hyperparameters are selected for training

  • Enough epochs to get near a minimum in the training loss; there's no real upper limit here. Just make sure the training loss is low and no longer improving much with more training.
  • Batch size is large enough to train efficiently, but small enough to fit the data in memory. There's no single “best” value here; it usually depends on GPU memory.
  • Embedding dimension is significantly smaller than the size of the vocabulary, if you choose to use word embeddings.
  • Hidden dimension (number of units in the hidden layers of the RNN) is large enough to fit the data well. Again, there's no single “best” value.
  • n_layers (number of layers in a GRU/LSTM) is between 1 and 3.
  • The sequence length (seq_length) should be about the length of the sentences you want the model to look at before it generates the next word.
  • The learning rate shouldn't be too large, or the training algorithm won't converge; but it needs to be large enough that training doesn't take forever.
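
Purely as an illustration, one plausible starting configuration is shown below; none of these numbers are prescribed by the rubric, and all of them should be tuned against your own training loss.

```python
# Illustrative starting values only, not rubric requirements.
sequence_length = 10    # roughly one sentence of context
batch_size = 128        # large enough to train efficiently, small enough for GPU memory
num_epochs = 10         # keep training until the loss stops improving
learning_rate = 0.001   # a common default for the Adam optimizer
embedding_dim = 200     # much smaller than a typical vocabulary size
hidden_dim = 256        # large enough to fit the data well
n_layers = 2            # within the suggested 1-3 range
```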

The model shows improvement during training

The printed loss should decrease during training. The loss should reach a value lower than 3.5.
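
A typical single training step for a recurrent model looks like the sketch below. The function name forward_back_prop and its arguments are assumptions, but detaching the hidden state between batches and clipping gradients are standard practice when training RNNs in PyTorch.

```python
import torch.nn as nn

def forward_back_prop(rnn, optimizer, criterion, inp, target, hidden):
    # Detach the hidden state from its history so gradients don't
    # backpropagate through the entire dataset
    hidden = tuple(h.detach() for h in hidden)

    rnn.zero_grad()
    output, hidden = rnn(inp, hidden)
    loss = criterion(output, target)
    loss.backward()

    # Clip gradients to guard against exploding gradients in RNNs
    nn.utils.clip_grad_norm_(rnn.parameters(), 5)
    optimizer.step()

    return loss.item(), hidden
```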

Question about hyperparameter choices is answered.

There is a provided answer that justifies choices about model size, sequence length, and other parameters.

Generate TV Script

Criteria | Meets Specification

The generator code generates a script

The generated script can vary in length, and should look structurally similar to the TV script in the dataset.

It doesn’t have to be grammatically correct or make sense.
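
For example, a bare-bones generation loop, assuming the RNN class sketched earlier (batch-first inputs, scores returned for the last time step), might look like this; the helper name generate is hypothetical:

```python
import torch
import torch.nn.functional as F

def generate(rnn, prime_id, int_to_vocab, predict_len=100):
    rnn.eval()
    current_seq = torch.LongTensor([[prime_id]])  # shape (batch=1, seq=1)
    hidden = rnn.init_hidden(1)
    predicted = [int_to_vocab[prime_id]]

    with torch.no_grad():
        for _ in range(predict_len):
            output, hidden = rnn(current_seq, hidden)
            probs = F.softmax(output, dim=1)
            # Sample the next word id from the predicted distribution
            word_id = torch.multinomial(probs, num_samples=1).item()
            predicted.append(int_to_vocab[word_id])
            # Feed the sampled word back in as the next input
            current_seq = torch.LongTensor([[word_id]])

    # In the real project you would also convert the special punctuation
    # tokens back into symbols before displaying the script.
    return ' '.join(predicted)
```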

Tips to make your project stand out:

  • Use validation data to choose the best model
  • Initialize your model weights, especially the weights of the embedding layer, to encourage model convergence
  • Use top-k sampling to generate new words (see the sketch after this list)
  • Check out the "Advanced projects" section in the project overview to take this work even further!
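
For the top-k tip, the multinomial step in the generation loop above can be restricted to the k most likely words. The helper name sample_next_word is hypothetical:

```python
import torch
import torch.nn.functional as F

def sample_next_word(rnn, current_seq, hidden, top_k=5):
    output, hidden = rnn(current_seq, hidden)
    probs = F.softmax(output, dim=1)

    # Keep only the k most likely words, then sample among them so the
    # script stays varied without wandering into very unlikely words
    top_probs, top_ids = probs.topk(top_k, dim=1)
    choice = torch.multinomial(top_probs, num_samples=1)
    return top_ids.gather(1, choice).item(), hidden
```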